This prediction is relatively quick, and the resulting three-dimensional coordinates are then
available for the user to download. However, the method requires a protein with a known
three-dimensional structure as a template, from which it calculates how the structure of the
user’s sequence differs. Whether a suitable template can be found is determined by a
dedicated sequence comparison against the proteins in the SWISS-MODEL database.
SWISS-MODEL is a robust, fast and frequently validated approach for deriving a
three-dimensional structure from a protein template. However, there are many other,
often much more complex, ways of calculating a protein structure (e.g. homology
modelling with MODELLER):
https://salilab.org/modeller/tutorial/
Since a structure that can serve as a template is not always available, so-called ab initio
and optimization algorithms compute an approximate structure from the sequence alone
by minimizing the free enthalpy. Prominent representatives here are neural networks,
evolutionary algorithms and Monte Carlo simulations.
One example is the QUARK server from the Zhang lab:
https://zhanglab.ccmb.med.umich.edu/QUARK/
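To illustrate the idea of a Monte Carlo search for a low-energy conformation, here is a minimal sketch in Python. It is not the method used by QUARK or any other server. The energy function `toy_energy`, the starting angles and all parameter values are hypothetical stand-ins chosen only to make the example runnable; a real ab initio predictor uses a far more elaborate energy model.

```python
import math
import random


def toy_energy(angles):
    # Hypothetical stand-in for a free-energy function over backbone angles.
    # It is smallest (0.0) when every angle is a multiple of 2*pi.
    return sum(1.0 - math.cos(a) for a in angles)


def metropolis_minimize(energy, start, steps=5000, step_size=0.3,
                        temperature=0.1, seed=0):
    """Metropolis Monte Carlo search; returns the best state seen and its energy."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Propose a small random change to one randomly chosen angle.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature) to escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start = [2.0, -1.5, 3.0, 0.5]  # arbitrary starting conformation (radians)
best, e_best = metropolis_minimize(toy_energy, start)
print("start energy:", toy_energy(start))
print("best energy found:", e_best)
```

The occasional acceptance of uphill moves is what distinguishes this from a purely greedy descent: it lets the search leave a local minimum, at the price of yielding only an approximate solution.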
Marking of the Known Structural Parts in the Protein Sequence
For independent verification, our chair offers a tool that labels the known three-dimensional
structural domains in any sequence (the technical term is domain annotation, which is
why our tool is called “AnDom”). This is a slightly different procedure and works for any
sequence: it simply checks whether at least a small piece of the sequence is similar to a
known three-dimensional protein structure. It is therefore completely independent of the
ExPASy predictions and can be used to check them. In general, independent databases and
software from different authors, based on different methods, can be used to check one
another. This can significantly improve the predictions, e.g. by collecting all structure
predictions (broad search) or by accepting only those found by both websites (particularly
well-validated predictions).
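The two ways of combining independent predictions correspond to a set union and a set intersection. The following sketch shows this in Python; the function name `combine_predictions` and the domain labels are hypothetical and serve only as an illustration, since the actual output formats of the two websites differ and would first have to be mapped to comparable labels.

```python
def combine_predictions(hits_a, hits_b):
    """Return (broad, validated): the union and the intersection of two hit lists."""
    a, b = set(hits_a), set(hits_b)
    broad = a | b       # broad search: found by at least one tool
    validated = a & b   # validated: found by both tools
    return broad, validated


# Hypothetical domain labels reported by two independent tools.
swiss_model_hits = ["kinase domain", "SH2 domain", "SH3 domain"]
andom_hits = ["kinase domain", "SH2 domain", "zinc finger"]

broad, validated = combine_predictions(swiss_model_hits, andom_hits)
print("broad search:", sorted(broad))
print("validated:   ", sorted(validated))
```

The union maximizes coverage at the cost of more false positives, whereas the intersection keeps only the hits that both tools agree on.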
The predictions can then sometimes be rather sparse. This happens when only short
parts of the sequence are sufficiently similar to the structures in the AnDom database.
It can also happen that the protein structure is new, i.e. not similar enough to any known
structure to allow a prediction. Just as with BLAST, a very small expectation value
(E-value; 1 in one million or lower) means that AnDom has very reliably revealed a
structural similarity. In contrast, a chance similarity can be recognized by a high
expectation value (higher than 1 in 1000). Such a weak similarity may even be found
several times by a random sequence. In this case, the expectation value is, e.g., 4 if a
random sequence would on average find four such hits in the AnDom structure database.
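The thresholds mentioned above can be turned into a simple rule of thumb. The sketch below is only an illustration: the function names, the label "borderline" for values between the two thresholds, and the assumption that chance hits follow a Poisson distribution (as in the standard BLAST statistics, where the probability of at least one chance hit is 1 - exp(-E)) are ours and are not part of the AnDom output.

```python
import math


def prob_at_least_one_chance_hit(e_value):
    # Assuming Poisson-distributed chance hits: P = 1 - exp(-E).
    # -expm1(-E) is the same quantity, but accurate for very small E.
    return -math.expm1(-e_value)


def classify_hit(e_value, strong=1e-6, weak=1e-3):
    """Label a hit using the two thresholds given in the text."""
    if e_value <= strong:
        return "significant"
    if e_value > weak:
        return "likely random"
    return "borderline"  # hypothetical label for the range in between


for e in (1e-9, 1e-4, 4.0):
    print(e, classify_hit(e), prob_at_least_one_chance_hit(e))
```

For very small expectation values the probability and the expectation value are practically identical, which is why the two are often used interchangeably. For an expectation value of 4, by contrast, a chance hit is almost certain.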
1 Sequence Analysis: Deciphering the Language of Life